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OPTIMAL TANGENT PLANE RECOVERY FROM NOISY 
MANIFOLD SAMPLES 

By Daniel N. Kaslovsky* and FRANgois G. Meyer"!" 
University of Colorado, Boulder 

Constructing an efficient parametrization of a large, noisy data 
set of points lying close to a smooth manifold in high dimension re- 
mains a fundamental problem. One approach consists in recovering 
a local parametrization using the local tangent plane. Principal com- 
ponent analysis (PCA) is often the tool of choice, as it returns an 
optimal basis in the case of noise-free samples from a linear sub- 
space. To process noisy data, PCA must be applied locally, at a scale 
small enough such that the manifold is approximately linear, but at a 
scale large enough such that structure may be discerned from noise. 
Using eigenspace perturbation theory, we study the stability of the 
subspace estimated by PCA as a function of scale, and bound (with 
high probability) the angle it forms with the true tangent space. By 
adaptively selecting the scale that minimizes this bound, our analysis 
reveals the optimal scale for local tangent plane recovery. 

1. Local Tangent Plane Recovery. 

1.1. Introduction. Large data sets of points in high-dimension often he 
close to a smooth low-dimensional manifold. A fundamental problem in pro- 
cessing such data sets is the construction of an efficient parameterization that 
allows for the data to be well represented in fewer dimensions. Such a pa- 
rameterization may be realized by exploiting the inherent manifold structure 
of the data. However, discovering the geometry of an underlying manifold 
from only noisy samples remains an open topic of research. 

The case of data sampled from a linear subspace is well studied (see 
[14, 16, 24], for example). The optimal parameterization is given by princi- 
pal component analysis (PCA), as the singular value decomposition (SVD) 
produces the best low-rank approximation for such data. However, most 
interesting manifold-valued data organize on or near a nonlinear manifold. 
PCA, by projecting data points onto the linear subspace of best fit, is not 
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optimal in this case, as curvature may only be accommodated by choos- 
ing a subspace of dimension higher than that of the manifold. Algorithms 
designed to process nonlinear data sets typically proceed in one of two direc- 
tions. One approach is to consider the data globally and produce a nonlinear 
embedding. Alternatively, the data may be considered in a piecewise-linear 
fashion and linear methods such as PCA may be applied locally. The latter 
is the subject of this work. 

There have been several versions of localized PCA for tangent plane re- 
covery proposed in the literature. While the need for locality has been ac- 
knowledged, a precise treatment of the size of the local neighborhood is 
often not addressed. The appropriate neighborhood size must be a function 
of intrinsic (manifold) dimensionality, curvature, and noise level. Despite the 
fact that these properties may change as different regions of the manifold 
are explored, locality is often defined via an a priori fixed number of neigh- 
bors or as the output of an algorithm (e.g., [26, 35, 17, 33]). Other methods 
[2, 25, 19] adaptively estimate local neighborhood size but are not equipped 
with optimality guarantees. 

The selection of the optimal scale, or neighborhood size, for local tangent 
plane recovery is the key contribution of this paper. What is novel about 
our approach is that we use the geometry of the data to guide our defini- 
tion of locality. On the one hand, a neighborhood must be small enough so 
that it is approximately linear and avoids curvature. On the other hand, 
a neighborhood must be be large enough to overcome the effects of noise. 
We use eigenspace perturbation theory to study the stability of the tangent 
plane as the size of the neighborhood varies. We bound, with high proba- 
bility, the angle between the recovered linear subspace and the true tangent 
plane. In doing so, we are able to adaptively select the neighborhood that 
minimizes this bound, yielding the best approximate tangent plane. Further, 
the behavior of this bound demonstrates the non-trivial existence of such 
an optimal scale. We are also able to accurately and efficiently estimate the 
curvature of the local neighborhood. Finally, we introduce a geometric un- 
certainty principle quantifying the limits of noise-curvature perturbation for 
tangent plane recovery. 

Our approach is similar to the analysis presented by Nadler in [24], who 
studies the finite-sample properties of the PCA spectrum. Through matrix 
perturbation theory, Nadler examines the angle between the leading finite- 
sample-PCA eigenvector and that of the leading population-PCA eigenvec- 
tor. As a linear model is assumed, perturbation results from noise only. 
Despite this key difference, the two analyses utilize similar techniques to 
bound the effects of perturbation on the PCA subspace and our results re- 
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cover those of Nadler in the curvature-free setting. Nadler also reports that 
sample-PCA suffers from a sudden "loss of tracking" of the true dominant 
eigenvector due to a crossover between signal and noise eigenvalues. We 
demonstrate a similar phenomenon, owing to geometry rather than noise. 
The present work therefore generalizes the study of Nadler to noisy samples 
from a nonlinear manifold model. 

Other recent related works include that of Singer and Wu [29], who use 
local PC A to build a tangent plane basis and give an analysis for the neigh- 
borhood size to be used in the absence of noise. Using the hybrid linear 
model, Zhang, et al. [34] assume data are samples from a collection of "flats" 
(affine subspaces) and choose an optimal neighborhood size from which to 
recover each flat by studying the least squares approximation error in the 
form of Jones' /3-number (see [15] and also [8] in which this idea is used for 
curve denoising). An analysis of noise and curvature for normal estimation 
of smooth curves and surfaces in and is presented by Mitra, et al. [22] 
with application to computer graphics. We also note the work of Maggioni 
and coauthors [4], in which multiscale PC A is used to discover the intrinsic 
dimensionality of a data set. 

The paper is organized as follows. The remainder of this section provides 
the intuition and assumptions of our approach and introduces the geometric 
model that is used throughout this work. We frame the problem as one of 
subspace perturbation in Section 2 and study the size of the perturbation 
as a function of scale in Section 3. The selection of the optimal scale is our 
main result and is presented in Section 4, along with the necessary geometric 
conditions for tangent plane recovery. Numerical results are given in Section 
5. We conclude with algorithmic considerations and a discussion of future 
directions in Section 6. 

1.2. Problem Setup. Our goal is to recover the best approximation to 
a local tangent space of a nonlinear d-dimensional Riemannian manifold 
A4 from noisy samples presented in dimension D > d. Working about a 
reference point xq, an approximation to the linear tangent space of Ai at 
xq is given by the span of the top d singular vectors of the centered data 
matrix (where "top" refers to the d singular vectors associated with the d 
largest singular values). The question becomes: how many neighbors of xq 
should be used (or in how large of a radius about xq should we work) to 
recover the best approximation? 

To answer this question, we examine the "noise-curvature trade-off." Given 
noisy samples of a linear subspace, the quality of PCA approximation im- 
proves as more points are included. However, the curvature of Ai prevents 



4 



D. N. KASLOVSKY AND F. G. MEYER 



the inclusion of a large number of points. Similarly, there exists a local scale 
about xq such that the effects of curvature are small, as Ai locally resembles 
Euclidean space. This suggests allowing only a very small radius about xq, 
yet at small scales, the sample points are indistinguishable from noise. We 
therefore seek a balance and assume there exists a scale large enough to be 
above the noise level, but still small enough to avoid curvature. This scale 
reveals a linear structure that is sufficiently decoupled from both the noise 
and the curvature to be well approximated by a tangent plane. We note 
that the concept of noise-curvature trade-off has been a subject of interest 
for decades in dynamical systems theory [9]. 

1.3. Geometric Data Model. A d-dimensional manifold of codimension 1 
may be described locally by the surface y = f{ii,---,id), where ii is a 
coordinate in the tangent plane. After translating the origin, a rotation 
of the coordinate system can align the coordinate axes with the principal 
directions associated with the principal curvatures at the given reference 
point Xq. Aligning the coordinate axes with the plane tangent to Ai at xq 
gives a local quadratic approximation to the manifold. Using this choice of 
coordinates, the manifold may be described locally [11] by the Taylor series 
of / at the origin xq: 

(1.1) y = fiii, ...,£d) = li^^iil + ■■■ + ^d£l) +o{il + ---+£l), 

where ki, . . . , are the principal curvatures of Ai at xq. In this coordinate 
system, xq has the form 

X0 = [h i2 ■■■ id f{h,...Jd)f 

and points in a local neighborhood of xq have similar coordinates. Gener- 
alizing to a d-dimensional manifold of arbitrary codimension in M^, there 
exist {D — d) functions 

m) = li4^il + --- + 4^il) + o{ii + ---+fd) 

for i = {d + 1), . . . ,D, with . . . , k^*^ representing the principal curva- 
tures in codimension i at xo- Then, given the coordinate system aligned 
with the principal directions, a point in a neighborhood of xq has coordi- 
nates [-^1, . . . ,idi fd+ii ■ ■ • ) /d]- We truncate this Taylor expansion and use 
the quadratic approximation 

(1.2) m) = \{4^il + ---+^fil\ 
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i = {d+ 1), . . . ,D, as the local model for our analysis. 

Consider now discrete samples from Ai that are contaminated with an 
additive Gaussian noise vector e drawn from the M (O, (J^Id) distribution. 
Each sample x is a D-dimensional vector and N such samples may be stored 
as columns of a matrix X E W^^^ . The coordinate system above allows the 
decomposition of x into its linear (tangent plane) component £, its quadratic 
(curvature) component c, and noise e, three D-dimensional vectors 

(1.3) l=[lil2 ■■■ id ^ ■■■ 0]^ 

(1.4) c = [0 ... Crf+i . . . CDf 

(1.5) e = [ei 62 ... eo]^ 

such that the last {D — d) entries of c are of the form 

(1.6) c, = \{4^il+ ... +^fil). 

We may store the N samples of c, and e as columns of matrices L, C, E, 
respectively, such that our data matrix is decomposed as 

(1.7) X = L + C + E. 

Remark. Of course it is unrealistic for the data to be observed in the 
described coordinate system. As noted, we may use a rotation to align the 
coordinate axes with the principal directions associated with the principal 
curvatures. Doing so allows us to write (1.2) as well as (1.7). Because we 
will ultimately quantify the norm of each matrix using the unitarily-invariant 
Frobenius norm, this rotation will not affect our analysis. We therefore pro- 
ceed by assuming that the coordinate axes align with the principal directions. 

The true tangent plane we wish to recover is given by the PCA of L. 
Because we do not have direct access to L, we work with X as a proxy, 
and instead recover a subspace spanned by the corresponding eigenvectors 
of XX'^ . We will study how close this recovered invariant subspace of XX'^ 
is to the corresponding invariant subspace of LL^ as a function of scale. 
Throughout this work, scale refers to the number of points in the local 
neighborhood within which we perform PCA. Given a fixed density of points, 
scale may be equivalently quantified as the radius r about the reference point 
xq defining the local neighborhood. 

2. Perturbation of Invariant Subspaces. Given the decomposition 
of the data (1.7), we have 

(2.1) XX^ = ll'^+cc^+ee^+lc^+cl'^+le^+el'^+ce'^+ec^. 
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To account for the centering required by PCA, define tlie sample mean of 
N realizations of random variable Y as 

1 ^ 

(2.2) E[y] = -5]y«, 

i=l 

where y^*^ denotes the ith realization. Let the mean of a matrix M be the 
matrix E[M] such that each entry of row i is the sample mean of the ith 
row of M. Let M denote the centered version of M: 

(2.3) M = M-E[M]. 
Thus we have 

(2.4) XX^ = ll'^+cc^+ee^+lc^+cl^+le^+el'^+ce^+ec^. 

The problem may be posed as a perturbation analysis of invariant subspaces. 
Rewrite (2.1) as 

(2.5) 1x1^ = 1lL^ + A, 
where 

(2.6) A = {CC^ + EE^ + LC^ + CL^ + LE'^ + EL^ + CE^ + EC^) 

is the perturbation that prevents us from working directly with LL^. The 
dominant eigenspace of XX'^ is therefore a perturbed version of the domi- 
nant eigenspace of LL-^. Seeking to minimize the effect of this perturbation, 
we look for the scale A^* at which the dominant eigenspace of XX'^ is closest 
to that of LL^ . Before proceeding, we review material on the perturbation 
of eigenspaces relevant to our analysis. The reader familiar with this topic 
is invited to skip directly to Theorem 2.1. 

The distance between two subspaces of R-^ can be defined as the spectral 
norm of the difference between their respective orthogonal projectors [12]. As 
we will always be considering two equidimensional subspaces, this distance 
is equal to the sine of the largest principal angle between the subspaces. 
We state our results in terms of the Frobenius norm as it will provide a 
simplification of Theorem 2.1. Then, by the equivalence of norms, we may 
define the optimal scale N* as 

(2.7) A* = argniin||P-P||ir, 
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where P and P are the orthogonal projectors onto the subspaces computed 
from L and X , respectively. The solution to (2.7) is the main goal of this 
work. 

The distance ||P — may be bounded by the classic sinG theorem of 
Davis and Kahan [6]. We will use a version of this theorem presented by 
Stewart (Theorem V.2.7 of [30]), modified for our specific purpose. First, we 
establish some notation, following closely that found in [30]. Consider the 
eigendecompositions 

(2.8) ^LL^ = UAU^ = [Ui U2] A [Ui U2f, 

(2.9) ^XX^ = UW^ = [Ui U2] A [Ui U2f^ 

such that the columns of U are the eigenvectors of j^LL^ and the columns of 
U are the eigenvectors of j^XX'^ . The columns of Ui are those eigenvectors 
associated with the d largest eigenvalues in A arranged in descending order. 
The columns of U2 are then those eigenvectors associated with the smallest 
{D — d) eigenvalues, and U is similarly partitioned. The subspace we recover 
is spanned by the columns of Ui and we wish to have this subspace as close as 
possible to the tangent space spanned by the columns of C/i. The orthogonal 
projectors onto the tangent and computed subspaces, P and P respectively, 
are given by 

p = UiUf and P = UiU^. 

Define to be the dth largest eigenvalue of -^LL?" , or the last entry on 
the diagonal of Ai. Note that corresponds to variance in a tangent plane 
direction. 

We are now in position to state the theorem. Note that we have made 
use of the fact that the columns of U are the eigenvectors of LL^, that 
Ai,A2 are Hermitian (diagonal) matrices, and that the Frobenius norm is 
used to measure distances. The reader is referred to [30] for the theorem in 
its original form. 

Theorem 2.1. (Davis & Kahan [6], Stewart [30]) 

Lets = Xd — \\U'f ^Ui^p — ^U2\\p and consider 

• ( Condition 1) J > 

• (Condition 2) \\Ul AU2\\p\\U^ /^Ui\\p < ^J^. 
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Then, provided that conditions 1 and 2 hold, 



(2.10) 



P-P < 2V2 " ^ , — ^. 



The two conditions of the theorem have important geometric interpreta- 
tions. Informally, condition 1 requires that the linear structure we seek to 
recover be sufficiently decoupled from both the noise and curvature (this 
is consistent with our assumption of the existence of a scale yielding suf- 
ficient decoupling). We may consider to be the condition number for 
subspace recovery. When 5 approaches zero, the condition number becomes 
large, and bound (2.10) loses meaning as we cannot recover an approximat- 
ing subspace. In Section 4 we will see that condition 1 naturally gives rise to 
an uncertainty principle that quantifies the limits of noise-curvature pertur- 
bation for tangent plane recovery. We will also see that the second condition 
naturally implies that the manifold be sufficiently sampled. 

The solution to (2.7) is impractical to compute. However, (2.10) is a tight 
bound, as will be demonstrated by the experiments (Section 5). Thus, a so- 
lution may be approximated by minimizing the right-hand side of (2.10). To 
do so, and to give each quantity in the theorem a geometric interpretation, 
we must first understand the behavior of the perturbation A as a function 
of the scale parameter A^. 

3. Bounding the Effects of Noise and Curvature. In this section 
we study the behavior of each term in (2.6) as a function of the scale pa- 
rameter A^. First, we provide insight as to their leading order behavior. As 
explained by Pukunaga [10] (Chapter 5), estimator bias and estimator vari- 
ance depend on the Hessian and gradient, respectively, of the function being 
estimated. Consider the local manifold model (1.2). This second order ap- 
proximation is presented in a coordinate system such that its gradient is 
zero and its Hessian is a diagonal matrix with the principal curvatures as its 
entries. We therefore expect perturbation terms associated with variance to 
tend to zero as the scale parameter A^ increases. Likewise, we expect pure 
curvature terms to grow with A^. Formal calculations will show that jjCC'^ , 
the term associated purely with curvature, has nonzero expectation that 
increases with N. Note that while the diagonal entries of jjEE'^ also have 
nonzero expectation, these terms do not grow with A^ and are therefore as- 
sociated with a noise-floor rather than with estimator bias. All other terms 
in (2.6) are zero in expectation, and thus only carry variance. Accordingly, 
these terms decay as 1/ \/iV- 
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3T. Preliminaries. 

3.1.1. Sampling a Linear Subspace. Consider sampling a linear subspace 
by uniformly sampling points inside B^^{r), the (i-dimensional ball of radius 
r centered at xq. We drop the dependence on xq from our notation for the 
remainder of this analysis. Because we are sampling from a noise-free linear 
subspace, the number of points captured inside B'^{r) is a function of the 
sampling density p: 



where Vd is the volume of the d-dimensional unit ball. As we wish to maintain 
a local analysis, we must enforce that r be small. To make this explicit, 
denote by rmax the largest radius within which the local model (1.2) holds 
and compute the number of points captured in B'^{rmax)'- 



Remark. Equation (3.3) suppresses the dependence on sampling that 
is captured by the pvd term. Note that because r is small, the sampling 
density p may have to be large to allow for large A^, as is explicitly seen 
in equation (3.1). The volume of the unit ball, Vd, is very small for even 
reasonable values of d, further necessitating a large sampling density. The 
analysis in this section may be performed entirely in the context of equation 
(3.1) provided that r is taken to be small. In doing so, the dependence on 
sampling density p is clear in all steps. We prefer to instead perform the 
analysis in the context of the rescaled equation (3.3), explicitly forcing the 
analysis to the local scale by considering r to be a fraction of the largest 
radius allowed by the local model. 

3.1.2. Notation. In this section and throughout the remainder of this 
work, we will make use of the following definitions involving the principal 
curvatures: 



(3.1) 



N = pvdr"^, 



(3-2) Nmax = pVdrtax- 

Then rescale (3.1) by dividing by (3.2) and solve for r: 



(3.3) 




d 



(3.4) 




71=1 



(3.5) 
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(3-6) ^nn ~ ^r}'^^\ ^Jin ~ '^m^n ''- 

n=l m,n=l 

The constant Ki quantifies the curvature in codimension i, for i = [d + 1), 
. . . ,D. Note that given our choice of coordinate system in Section 1.2, Ki 
is the trace of the Hessian in the ith codimension. The overall curvature of 
our local model is quantified by K and is a natural result of our use of the 
Frobenius norm. We note that KiKj = Knn + Kmn- 

By the choice of coordinate system, U2 is the D x (D — d) matrix whose 
columns are the last {D — d) columns of Id, the identity matrix of order D. 
Due to the specific form of each matrix, we have UfC, C'^Ui, U2L, and 
L^U2 are all zero matrices of the appropriate size. Because A is a symmetric 
matrix, we have that ||f72'^^i||i7' — ||f^i"^^2||^- Finally, we will work with 
projections of vector a onto Ui and where a takes the form of £, c, or e 
(equations (1.3)-(1.5)), and denote such projections by 

(3.7) U^a = au,, for p = {1,2}. 

3.2. Analysis of Perturbation Terms. We begin by presenting our general 
strategy for bounding terms of the form \\Up jjAB^Uq\\F for p,q = {1,2} 
where A and B are general matrices of size D x N. The key observation is 
that jfAB^ is a sample mean of N outer products of vectors a and b, each 
sampled from a given distribution: 

^AB^ = E[ia-E[a]){b-mf], 

where E[y] is the sample mean defined in (2.2). We therefore expect that 
j^AB'^ win converge toward the centered outer product of a and b. 

We will use the following result of Shawe- Taylor and Cristianini [28] to 
bound, with high probability, the norm of the difference between this sample 
mean and its expectation, 

(3.8) E[Uj{a-E[a]){b-E[b] fUg] - E[U^ {a -E[a]){b -E[b]fUg] 

where E\Y] is the expectation of the random variable Y £ y. 

Theorem 3.1. (Shawe- Taylor & Cristianini, [28]). Given N samples of 
a random variable Y generated independently at random from y according 

2 

to the distribution Py, with probability at least 1 — e"^ over the choice of 
the samples, we have 

R 



(3.9) ||E[y] -i[y] < — (2 + 77^2) 
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where R = sup5„pp(py) II^IIf and supp^Py) is the support of distribution Py- 



Remark. With a slight abuse of notation, we note that the "Frobenius 
norm of a vector" is equivalent to the vector's Euclidean norm, and thus we 
use II • ||i? for both matrices and vectors. 

Remark. The choice of R in (3.9) need not be unique. Our analysis will 
proceed by using upper bounds for ||y||i^ which may not be suprema. 

Continuing from (3.8), 

E[C/J(a-E[a])(6-E[6])^C/g] - ¥.[U^ {a -¥.[a]){h -¥.[h\fUq 

T 



(3.10) 



< 



Because W] =0 and E[el 



+ 



T 



0, E[aup]E[6„^] is nonzero only for the case 



[a = h = c,p = q = 2). In this case, ¥.[U^ (a - E[a])(6 - E[6])^ Uq] = 

E[C/2"(c — IE[c])(c — E[c])"^C/2] is an empirical covariance matrix. As shown 
in [28], such a matrix is unchanged when the origin is shifted by a fixed 
translation. Therefore we may assume that the origin has been shifted to 
the center of mass of the distribution and we may take E[cu2] ^['^m2] *° 
be zero. Note that we may only do so in the context of this calculation, and 
in general E[cm2] ^^"^ ^['^^2] ^'^^ nonzero. Then for all choices of {a,b,p,q), 
we have E[aup] E[6^^] = and the right-hand side of (3.10) becomes 



T 



(3.11) 



< 



E[a„^6^J-E[a„^6^ 



T 



+ 



EKJ-E[aU ^ E[6n-E[6^ 



We now use Theorem 3.1 to bound each of the three terms in (3.11). 
For this analysis, the random variable Y in Theorem 3.1 takes one of the 
following two forms: 

uT 



Y = auXu. 



or Y 



for p^q = {1,2}. Thus there are two corresponding definitions for R: 
(3.12) 

(3.13) Rl 



sup 

SUpp(Pa) 

supp(Pb) 

sup 

SUpp(Pa) 



\0'uXq\\F 



\aup\\F 
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where a and b are sampled according to distributions Pa and Pf,, respectively. 
Directly applying Theorem 3.1 to each of the three terms in (3.11) and using 
a standard union bound argument yields 



+ 



(3.14) 



< 



ah 



2 + 7]abV2) + 



RaRl 



N 



EKJ-EKJ ^ E[bl]-E[bl 



2 + 1] 

a ^2 ) (2 + 



with probability greater than 



(3.15) 



1 



e '"6 



over the random sampling of a and b. For the case that a = b we instead 
simply have the result holding with probability greater than 



(3.16) 



1 



Vaa 



over the random sampling of a. The probability constants may be chosen 
to ensure such an event holds with high probability. For example, in (3.15), 
letting rjab = i]a = rjb = rj, we have probability greater than 0.9451 for r/ = 2 
and greater than 0.9996 for ij = 3. 



Putting it all together, we have that 

\\E[U^{a-E[a]){b-E[b]fUg]\\F - \\E[U^ {a - E[a]){b - E[b]fUg]\\F 
(3.17) 



< 



ab 



2 + r]abV2 



RaRl 

N 



2 + rj 

a 

V2 ) (2 + rjb^/2 



and we may conclude that 

1 



(3.18) 



T 



N 



/X - r, ^ + r 



where n 
and r 



Rab 



2 + r]abV2) + 



F 

N 



2 + riaV2) (2 + r]hV2) , 



with probability greater than 



e '^b for a 7^ 6 
for a = 6 
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over the random sampling of a and b. 

Before computing the constants and Ra, we must ensure that either 
the suprema (3.12) and (3.13) exist or that finite bounds may be given in 
place of suprema. Noting that 



(3.19) 



< \\a„ 



\bu. 



\eu2\\ < cr (VD -d + CeV2 



it suffices to show that the projections iup, Cu^, and are bounded. The 
vectors i and c are functions of the coordinates of points drawn uniformly 
from B'^{r). Therefore their entries are bounded, as are the norms of their 
projections. The entries of e, while unbounded in general, are Gaussian ran- 
dom variables and are therefore bounded over a set with large measure. Let 
Vlp be the set in for which both 



(3.20) 
(3.21) 



hold. A formal construction of fig is given in Appendix A and the measure 
7£)(r2e) of this set is shown to be large, 

7D(f^e) > l-2e-«'. 

Therefore, on rig, we may state bounds for i?§ (p = 1,2). All results involving 
projections of the vector e will be given over this set. We apply Theorem 
3.1 by conditioning on e G Og. If we consider the joint probability of the 
independent random variables a and e, then we can estimate the probability 
that the deviation of the outer-product of and e^^ from its expectation 
is large. This probability is given by: 



Prob 
= Prob 
+ Prob 



I nvl] - n^upeUy > ^(2 + r/aeV2) 



N 
r>pq 



I EK^e^J - EK^e^JII^ > ^(2 + r?, 



IEK<]||f> 



npq 



(2 + r/„ 



Prob 
Prob 



e £ Qe 



< (e-''-)(l-2e-«') + (l)(2e~«' 
= e"''- + 2e-^- - 2e-^-e-«' 

Thus we have that 



(3.22) 



T 



< 



2 + r]aeV2^ 
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2 c-2 2 c-2 

with probability greater than 1 — e~'''"= — 2e~^^ +2e~'''"=e~^'= over the random 
samphng of a and random reahzation of e, and an identical calculation holds 



when applying the theorem to 
(3.15) can be bounded by 



. The probability given in 



1 

> 1 



2e ^= + 2e ^=(e + e ''^ 
2e-«^' 



(3.23) 

and when {a = h = e) the probability given in (3.16) can be bounded by 



(3.24) 



1 

> 1 



2e" 
2e" 



+ 2e-«'(e-'?- +e-^') 



where we have neglected the positive contribution of the product of expo- 
nentially small terms. 

3.2.1. Suprema R^l and Ra- We now compute bounds for the Ra terms. 
Simple norm calculations give 



^ui \ \f 



< r2 

/ J — max 
i=\ 



N 



N„ 



-U2 1 1 F 



(3.25) 




< 



2^4 



N 



where we have assumed all principal curvatures have the same sign. Com- 
bining with (3.20) and (3.21) yields bounds for Ra terms, and (3.19) may 
be used to bound the R^l terms. The results are listed in Appendix B. 

Remark. The calculation (3.25) for HcujUf requires that all principal 
curvatures have the same sign for the inequality to hold. This requirement 
will carry through as an assumption in the statement of Theorem 4.1 (Main 
Result 1). To avoid this requirement we must work with moments of rather 
than with norms, specifically when computing CL^ and LC"^ terms. While 
not possible here, it is possible to do so as demonstrated in Main Result 2 
(see Section 4.3.2). Despite this assumption, it will be seen in Section 5 that 
this current analysis does in fact provide meaningful results for principal 
curvatures of mixed signs (except for when Ki = 0), indicating that tighter 
i?^^ bounds are possible. We note that Main Result 2 will require no such 
assumption and will hold for any value of K^. 
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3.2.2. Expectations. We are almost in position to define confidence in- 
tervals of the form (3.18), where A and B may be the centered matrices 
L, C, and E. All that remains is to compute the true expectation term of 
equation (3.17): 



(3.26) 



E[Uj{a-E[a]){b- 



As the coordinates of i and c are functions of points sampled uniformly from 
B'^{r) and e ~ Af{0,a'^lD), the expectation terms are zero for ci'^ , ie^, and 
ce^. Only the pure curvature (cc^) and pure noise (ee^) terms may have 
nonzero expectations and their calculations are given in Appendix B. We 
list here only the results. 



Pure Curvature Term: 



(3.27) E[cu,cl^]-E[cu,]E[cl^] 



N 



2(d + 2)2((i + 4) \N, 



D D 

^ [(d+i)i^i-irl 

i=d+l j=d+l 



We note that later, for the purpose of interpretation, we will replace this 
exact expectation with an upper bound. Using that E[£^£^] < IE[£^] and 



KiKj, we bound 



(3.28) E[cu,cl^]-E[cu,]E[c-, 



U2l 



F 2(d + 2)2(d + 4) 



N 



The nonzero expectation (3.27) and its bound (3.28) grow with N and in 
this way may be thought of as the source of the estimator bias (see the 
beginning discussion of Section 3). 



Pure Noise Terms: 

'cT^V^d if (p,g) = (l,l), 

a'^VD^ if (p,g) = (2,2), 

if (p,g) = (2,1). 

Note that unlike the nonzero expectation of the pure curvature term, the 
nonzero expectations of the pure noise terms {p = q) are constant and do 
not grow with A^. Thus they represent a noise floor rather than a source of 
estimator bias. 



Ef 



El 
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3.2.3. Norm Bounds. We may now use the right-hand side of the con- 
fidence interval (3.18) to bound the size of the perturbation norms. When 
considering noise terms, recah that we must condition on e G Jig- To aid 
in interpretation, we recall the rescaled notation r = rmax{N/NmaxY^'^ ■ As 
each curvature term c has norm roughly of size Xr^, we expect || -^CC"^ Hi? 
to grow as K^r^. Concentration of Gaussian measure indicates that the 
norm of the noise matrix will have size that depends on the square root of 
the projection dimension and the variance cr^. All other terms are zero in 
expectation and thus we expect 1/y/N decay. The linear-curvature, linear- 
noise, and curvature- noise matrices should have norm Kr'^/^/N., ar/^/N, 
and Kar"^ /^/N , respectively. The leading order behavior of all norm bounds 
may be found in Table 4.1 and the full expressions with associated proba- 
bilities are given in Appendix C. 



4. Optimal Scale Selection and Subspace Recovery. Our main 
result, a bound on the angle between the recovered and true tangent planes, 
is formulated in this section. First we use the triangle inequality to bound 
the norms appearing in Theorem 2.1. We then inject the perturbation norms 
computed in Section 3 and Appendix C to formulate the main result. 

We have: 



|C/fA[/i|L < 2 



|C/jAf/i||^< 



+ 



-CL' Ui 



-EE' Ui 



+ 



+ 2 



+ 



Ul^EE^Ui 



-CE' U2 



-EL' Ui 



+ 



+ 



Ui-EE'U2 



-CE' Ui 



As A is a symmetric matrix, we also have that HJ/^J^AC/iH^ = ||C/]^A[/2||p'- 
Using a standard union bound argument, the bounds for each term hold 
simultaneously with probability greater than 



(4.1) 1- 



-2e" 



-2e" 



over the joint random selection of the sample points and random realization 
of the noise. We may pick a constant rj and set 



(4.2) rjcc = rjee = mc = me = tlce = Vi = Vc = fje = V 
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such that (4.1) becomes 

(4.3) l-13e-'''-2e-«'. 
Finally, recall from Theorem 2.1 that 

(4.4) 6 = Xd- pfAUiW^ - \M^U2\\p . 

In order to bound the size of 5 we must compute A^, the dth eigenvalue 
of ^LL^. This matrix is a centered covariance matrix and therefore its 
dth eigenvalue is the variance in the dth coordinate. From our moment 
calculations in Appendix D, we let 



(4.5) 



Xd = Var[4] 



d + 2 \N, 



N 



4.1. Main Result: Bounding the Angle Between Subspaces. We are now 
in position to apply Theorem 2.1 and state our main result. First, define the 
following constants: 



Piiv) 



D D 

i=d+l j=d+l 

2 + 77^/2) + 

Vd + ie^\ , 



[2 + r/V2 



(vD-d+ee\/2) , 

N 



^ ^ ' max 



' ri 

Then we have our main result: 



+ 'yp?X^e)Kr. 

N 



2 

max 



N 



N„ 



+ ^Hp2iCe?+P3iCef) 



Theorem 4.1. (Main Result 1). Let the following conditions hold: 

• (Condition 1) = - ||?7f AC/i||^ - ||C/J'A[/2||^ > 0, 

• (Condition 2) \\U^AUi\\f < ^6. 
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For each i = (d + l), . . . , D, let principal curvatures have the same sign, 
n = 1, . . . , Then we have 

(4.6) 

2^ W + ^P2fe)f + <yp2{ie)r + <y^P2{ie)V^{ie)] 

with probability greater than 1 — 13e~''^ — 2e~'>^ over the joint random se- 
lection of the sample points and random realization of the noise, where the 
rescaled notation r = rmax{N /NmaxY^'^ has been used. 

Proof. Applying the norm bounds computed in Appendix C to Theorem 
2.1 and choosing the probabihty constants as given in (4.2) yields the result. 

□ 

The optimal scale N* may be selected as the for which (4.6) is min- 
imized. As will be analyzed in Section 4.2.1, (4.6) is either monotonically 
decreasing (for the curvature- free case), monotonically increasing (for the 
noise- free case), or decreasing at small scales and increasing at large scales 
(for the general case). We therefore expect a unique minimizer in all cases. 
Note that the constants ry and need to be selected to ensure that this 
bound holds with high probability. 

As discussed in Section 2, we may interpret as the condition number 
for tangent plane recovery, and we may analyze it by bounding (4.4) using 
the bounds for ||C/]^AC/i Hi;' and ||[/^AC/2||_p. We note that the denominator 
in (4.6) is a lower bound on 5 and we therefore analyze the condition number 
via this bound. From Main Result 1 (4.6), we see that when is small, 
we may recover a tight approximation to the true tangent space. Likewise, 
when becomes large, the angle between the computed and true subspaces 
becomes large. The notion of an angle loses meaning as tends to infinity, 
and we are unable to recover an approximating subspace. 

Condition 1 imposes that the spectrum corresponding to the linear sub- 
space {\d) must be well separated from the spectra of the noise (||C/iAC/i||ir) 
and curvature (||C/2^f^2||F) perturbations. In this way, condition 1 quanti- 
fies our requirement that there exists a scale such that the linear subspace 
is sufficiently decoupled from the effects of curvature and noise. When the 
spectra are not well separated, the angle between the subspaces becomes 
ill-de&ied. In this case, the approximating subspace contains an eigenvector 
corresponding to a direction orthogonal to the true tangent plane. In the 
language of [24], there is a crossover between the spectrum of the linear 



P -P 
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subspace and the spectrum of the perturbation, and we will observe a loss 
of tracking of the true tangent plane. Unlike the result of [24] where the 
crossover results from noise perturbation, we will demonstrate a crossover 
due to geometry in Section 5. As we will see, condition 1 indeed imposes a 
geometric requirement for tangent plane recovery. 

With condition 1 imposing restrictions on the effects of curvature and 
noise, condition 2 may be interpreted as a control on sampling. This condi- 
tion may be satisfied by increasing the sampling density, as such an increase 
allows N to take on large values. Recall that r = r^yiax 

{N/Nmaxf''. While 

(N/Nmax) terms are bounded by one, the numerator of (4.6) is composed 
of terms that behave as Thus, provided that the denominator is 

well-conditioned, condition 2 may be satisfied by allowing for large enough 
N. 

Before numerically demonstrating our main result, we give a practical 
interpretation of its conditions and implications. In doing so, we quantify 
the separation needed between the linear structure and the noise and curva- 
ture with a geometric uncertainty principle. Then in Section 4.3, we present 
evidence of a tighter main result through the Central Limit Theorem. 

4.2. Interpreting the Bound. The bound (4.6) is difficult to interpret 
due to many complicated terms. For subspace tracking, the behavior of the 
bound as a function of scale is of more practical use than the bound's actual 
value. The scale at which the bound reaches its minimum and the scale(s) at 
which it becomes ill-conditioned are the quantities of interest. Thus we now 
analyze our main result using more practical and interpret able, albeit less 
sharp, bounds. The following bound is not as sharp as Main Result 1 owing 
mainly to a less precise treatment of the principal curvatures. We replace 
equation (3.27) with (3.28) for the expectation of the pure curvature term, 
and we analyze only leading-order behavior of each term. Neglecting the 
probability-dependent constants and the contributions of the jjLE'^ and 
jj CE^ terms, we may write our main result as: 



Interpretable Main Result 1. 



(4.7) 



P-P 



2^/2- 



< 



f r3 + <j'^^d{D - d) 



K'^r^ 



and we recall that r 



d+2 
' max 

{N/N, 



jd+l) 
2(d+2)2(d+4) 



(j2 l^ + ^/D^ 



4.2.1. Revisiting the Noise-Curvature Trade-off Through the Condition 
Number. We may now use this more compact, yet less sharp, form (4.7) of 
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our main result to provide some interpretation for the bound (4.6). Consider 
first tlie denominator. Assume that sampling is sufficiently dense such that 
N may become large. Doing so ensures condition 2 is met and we may focus 
our attention on condition 1 (and our neglect of the j^LE^ and jqCE'^ 
terms in the denominator of (4.6) is also justified). The denominator of 

(4.7) is of the form 

(4.8) 5 = -;^-K\^—S^±^L_^-a'(Vd + VD^ 
^ ^ d + 2 2(d + 2)2(d + 4) V 

It is now easy to see that 6 quantifies the separation between the linear 
subspace (©(r^)) and the perturbation due to the curvature (O(i^^r^)) and 
the noise level {a'^{^/~d + \JD — d)). To approximate the appropriate linear 
subspace, we must at least have that 5 > as required by condition 1, 
and the approximation improves for larger 6. We note the similarity of this 
condition to that of equation (2.10) in [24]. 

The noise-curvature trade-off is now readily apparent. The linear and 
curvature contributions, controlled by r = rmax{N/Nmax)^^'^, are small for 
small values of A^. Thus for small, the denominator (4.8) is either neg- 
ative or ill conditioned for most values of a. This makes intuitive sense as 
we have not yet encountered much curvature but the linear structure has 
also not been explored. Therefore the noise dominates the early behavior 
of this bound and an approximating subspace may not be recovered from 
noise. As A^ increases, the conditioning of the denominator improves, and 
the bound is controlled by the 1 /^/N behavior of the numerator. This again 
corresponds with our intuition, as the addition of more points serves to over- 
come the effects of noise as the linear structure is explored. Thus, the bound 
becomes tighter. Eventually, A^ becomes large enough such that the cur- 
vature contribution approaches the size of the linear contribution, and S^^ 
becomes large. The 1/y/N term is overtaken by the ill conditioning of the 
denominator and the bound is forced to become large. The noise-curvature 
trade-off, seen analytically here in (4.8), will be demonstrated numerically 
in Section 5. 

4.2.2. Geometric Uncertainty Principle for Subspace Recovery. Impos- 
ing condition 1 on (4.8) yields a geometric uncertainty principle quantifying 
the amount of curvature and noise we may tolerate. Solving for the range of 
scales such that 5 > 0, the following condition naturally arises. To recover 
an approximating subspace, we must have that: 
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\ / \ / 




(a) (b) 

Fig 1. [color online] Illustration of the geometric uncertainty principle (4.9). For a man- 
ifold of fixed curvature K, (a) shows an acceptable noise level such that the geometry of 
the data remains intact and a tangent plane may he approximated from the noisy data, 
(h) illustrates a violation of the uncertainty principle as the manifold geometry may he 
destroyed by the noise. In this case a tangent plane approximation cannot he recovered. 



Geometric Uncertainty Principle 

(4.9) Ka < 



(d + 4) 



2(d+l)(^/d + /D^) 

By preventing curvature and noise from simultaneously becoming large, this 
requirement ensures that the geometry of the data is not destroyed by noise. 
With high probability, the noise concentrates on a sphere with mean cur- 
vature l/a^/D. Intuitively, we expect to require that the curvature of the 
manifold be less than the curvature of this noise-ball. Recalling the defini- 
tions of Ki and K from equations (3.4) and (3.5), Ki/d is the mean curvature 
in codimension i. The quadratic mean of the (D — d) mean curvatures is then 
given by K/dy/lD^ and we denote this normalized version of curvature as 
~K. Then (4.9) requires that ~K < 0{l/adDi). Noting that d > I > D~l , 
the uncertainty principle (4.9) indeed may be interpreted as a requirement 
that the mean curvature of the manifold be less than that of the perturbing 
noise-ball. Figure 1 provides an illustration. 

It is important to keep in mind that equation (4.9) is computed using the 
compact bound (4.7), and is thus meant for interpretation. For the precise 
expression represented by (4.9), the derivation must start with the full bound 
(4.6). 

4.3. Towards a Tighter Bound: Chasing the Constants. Thus far we have 
presented a rigorous analysis bounding the norm of each perturbation term. 
The analysis captures leading order behavior with high probability by utiliz- 
ing Theorem 3.1, but does so at the cost of attaching large constants to each 
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term. Theorem 3.1, a result derived from the bounded difference and Hoeffd- 
ing inequalities [28], introduces constants based on suprema of functions of 
random variables taken over the support of their distributions. Accordingly, 
each perturbation term is shown to deviate from its expectation by factors 
larger than constant multiples of its variance. 

In this section we use the Central Limit Theorem (CLT) to show that 
the variance of the perturbation terms controls the deviation from their 
expectations. Doing so yields tighter bounds for each term and a tighter 
overall main result. Further, by working with moments of the underlying 
random variables rather than norms, a more precise treatment of curvature 
terms is possible, allowing a relaxation of the assumption in Main Result 1 
that all principal curvatures in codimension i have the same sign. Despite 
the fact that our analysis is most often to be applied to sample sizes on 
the order of iV = 10^ or 10^, we must acknowledge that the sample means 
with which we work have a Gaussian distribution only in the limit as N 
tends to infinity. This finite-sample analysis can be made rigorous through 
the use of Bernstein-type inequalities and concentration of measure (in fact 
such approaches yield only sightly larger constants). However, we proceed 
with a CLT-based analysis, treating convergence in distribution as equality, 
to provide evidence that a tight bound may be achieved and to motivate 
future analyses that may rigorously yield tighter constants. 

4.3.1. Central Limit Theorem and Gaussian Tail Bounds. As previously 
seen, each entry of a matrix jjAB'^ is a sample mean of N i.i.d. random 
variables. The CLT and a Gaussian tail bound yield a confidence interval 
for such an entry. Using a union bound to simultaneously control all of the 
entries of this matrix, we may give an overall confidence interval for the 
value of its Frobenius norm. While such an analysis yields a tighter result 
than that using Theorem 3.1, it holds with lower probability due to the use 
of many union bounds. 

Remark. It is important to note that the probability attached to this 
analysis corresponds to the result serving as an upper bound for the true 
subspace recovery error. The probability is increased by choosing large values 
for the constants appearing in the exponential terms. This in turn loosens the 
result so as to bound any random fluctuation of the true error from above. In 
many practical applications, we are most interested in tracking the recovery 
error regardless of whether we guarantee an upper bound. When the need 
for an upper bound is relaxed, we will demonstrate in Section 5 that the 
leading order behavior of the following analysis tightly tracks the trend of 
the true error curve. 
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The analysis proceeds as follows. An entry of matrix j^AB^ has the form 

(4.10) (n^^^) = ^^E^^'-^^j^ - nA^nB,], 

^ ^ k=i 

where A and B represent the matrices L, C, and E from equation (1.7). We 
use the CLT to assert that for the i.i.d. random variables {Ai^^Bj^k) with 
mean jj, and variance u^, A; = 1, . . . , iV, and for large A^, 



(4.11) 



k=i ^ ^ 



Let Y ^ M [ii^ cr^) . Then we have 
(4.12) Prob{|y-/x| >e} <exp 



e 

'2^2/ 



1 



and we may pick e ~ 0{a) to ensure that |y — ;u| < e with high enough 
probability. Setting e = 1^(7^/2 gives 



Y e 



\/2, ^ + r?o-\/2 



with probability greater than 1 — e~'^ , and we have a confidence interval 
that depends on the variance of Y and whose probability is controlled by 
the constant ij > 0. 

We compute the size of the entries in A and detail the norm bounds in 
Appendix E. The main result is now stated after defining some constants. 

4.3.2. A Tighter Main Result. Just as we needed notation for the sake 
of the readability of Main Result 1 , we now define new constants for Main 
Result 2: 



CCi 



id 



i\.m.r 



id + 2){d + 4) 



1 



VCCi ^nn 



' 48(d+ l)(4d+ 17) 
(d + 4)2((i + 6)(d + 8) 



+Aricc2Kiir. 



(d2 + 5d + 3) 



+ 



4.r]cKiKj d + l 



(d + 4)2(d + 6)(d + 8) {d + 2) \ d + A 



1 Ar]lKiKj{d+l] 
N (d + 2)(d + 4) 



y(d + 4)(d + 6) ^ id + 2 



V(d + 4)(d + 6) 
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1 2r]LVc Id + l 



N{d + 2)\J d + A 



C£, 



££{x) = o^\fx 
Additionally, we will need 

(D D 
i=d+l j=d+l 
d D 



2 (7]EEiV2 + r]EE2Vx-l) + ^ril (l + Vx - l) 



K" 



i=l j=d+l 



^ D \ 2 

\i=d+i / 



L' = y^d{D - d) 
E' = y/d{D - d) 



VLE + VLVE 



VEE2 + Ve 



and 



/ 2(D-d) 2 



+ 2L ad\ I - — -r^ 



+ 88(d) +££{D-d). 



After using the triangle inequality in the exact same manner as in the begin- 
ning of Section 4, combining the norm bounds (computed in Appendix E) 
with Theorem 2.1 yields a new main result. Assume that conditions 1 and 
2 hold. Then using the rescaled notation r = rmax (N/Nmax)^^'^ , we have: 



Main Result 2. 

(4.13) 
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with probability greater than 
1-{D- df 

- d{D -d) {d- l)e~''^c7i + e'^'^c^a - dDe-'^tE _ d{D - d)e-'^cE 



2 



^ -„2 - D _„2 
2)g IE El g 'IEE2 



- de-^'l - de-'^c - De-'^E 

over the joint random selection of the sample points and random realization 
of the noise. 

The comments following Main Result 1 apply here as well. In particular, 
conditions 1 and 2 have the same interpretation and the denominator 5 
controls the conditioning of the recovery problem. Further, we note that the 
leading order behavior of the perturbation norms has not changed. Main 
Result 2 exhibits the same behavior as Main Result 1 , but provides a tighter 
tracking of the true subspace recovery error, as will be shown in Section 5. 
Thus this analysis gives rise to the same interpretable bound as (4.7), up to 
multiplicative constants. The same geometric uncertainty principle applies 
as well. 

Table 4.1 shows the leading order behavior for each perturbation norm. 

1 

Note that rmax {N / Nmax) has been replaced by r and only the leading 
order in d is shown. This side-by-side comparison reveals the reasons why 
Main Result 2 is a tighter bound. As suprema terms are replaced by variance 
terms, the CLT result introduces powers of l/d that reduce the size of many 
norms. Additionally, the approach of Main Result 2 allows for a more precise 
treatment of the principal curvatures, most importantly for the CL^ term. 
This precise treatment allows Main Result 2 to relax the assumption that 
all principal curvatures in codimension i have the same sign, as required 
by Main Result 1. Finally, notice that Theorem 3.1 introduces probability 
constants of the form (2 -|- ?7\/2), whereas the CLT introduces probability 
constants of the form r/\/2. Thus the CLT yields tighter bounds. 

4.4. Consistency with Previously Established Results. In [29], Singer and 
Wu study local PCA for tangent plane recovery in the absence of noise. 
The covariance matrix is decomposed following the assumption that for a 
given r, the number of points in a ball of radius r is large, implying a 
model with variable density. Fixing a density allows us to translate their 
results to our model. Then the covariance matrix decomposition yields error 
terms corresponding to curvature of size 0{r'^) and finite-sample variance of 
sizes 0{r^)/VN, 0{r^)/VN, and 0{r^)/VN. Recalling that ||^CC^||f ~ 
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Norm 


Main Result 1 (top) / Main Result 2 (bottom) 


WU^CC^U^Wf 


2d3 


E E [id+i)Ki^„-K:i„]^ 

i=d+l j=d+l 


i 

2 




2d3 


E E [{d+i)Kii-Kii„f 

i=d+l j=d+l 


2 


\\UTEE^Ui\\f 






\\Ui'EE^U2\\F 






WU^EE'^UiWf 


^ y/d{D-d)+^eV2(Vd+VD-d) +2^1 {2 + neeV2} 


-^y/2d{D-d)r,EE, 


\\Ui'CL'^Ui\\F 


-l^r^K{2 + r„.V2) 


1 /s" 
Vn dVdy 2 


i 

E E (Kj{vlc, + 'Hi) + k^\vlc,V5-Vlc,)Y ' 
i=i j=d+i V y 


\\UTEL^Ui\\f 


(^Vd + ^eV2) (2 + neeV2} 


-;^rV2di]LE 


WUIEL^UiWf 


f^T {^D-d + CeV2) (2 + r]ee^/2) 


^r^2iD-d)vLE 


\\UICE''Ui\\f 


:^^K(^Vd + ieV2) (2 + r,ceV2} 


i=d-\-l 


( VCE ./QSrii 1 ffii 1 R' l 
VV2(<*+4) V2('*+2) / 


\\U^CE''U2\\f 


:^^K{y/D-d + ieV2) (2 + 77ee\/2) 


Vn' 




' D / \ 2] 2 





Table 4.1 

Comparison of leading order perturbation terms for Main Result 1 ( top ) and Main Result 
2 (bottom). Notationally, r-max (N/Nmax)^ has been replaced by r and only leading order 

d terms are shown. 
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0{r^) + 0{r^)/^/N and || ^^CL-^Hj? ~ 0{r^)/^/N , our analysis recovers the 
same leading order behavior as reported by Singer and Wu in the noise-free 
setting. 

As previously discussed, Nadler presents a finite-sample PCA analysis in 
[24] assuming a linear model. Setting curvature terms in Main Results 1 
and 2 to zero recovers Nadler's leading order bound on the angle between 
the finite-sample eigenvector (s) and the true eigenvector (s). In our notation, 
Nadler reports that, to leading order, the angle is bounded by 

where d is taken to be one. We now show that our results recover this leading 
order behavior. First, set all curvature terms to zero. Next, assume condition 
1 holds such that the denominator 5 is sufficiently well conditioned and we 
may neglect all terms other than A^. Using the more compact notation of r 
in place of Vmax (N/Nmax)'^ and following the approach in [24] of retaining 
only leading order terms and dropping probability constants, we have 

si'^^i.f/i ^ ^2 = r 7^ + ^(^ )■ 



r \/ N 

Taking d = \ and noticing that \/A^ ~ r yields Nadler's bound. 

5. Numerical Results. In this section we demonstrate that the bounds 
of Main Results 1 and 2 accurately and efficiently track the true subspace 
recovery error and may therefore be used to obtain the optimal scale for 
tangent plane recovery. We then address the case of data sampled from a 
saddle (such that the principal curvatures are of mixed signs) that brings to 
light a particular difference between the two main results. We explain and 
numerically demonstrate the behavior of the true subspace recovery error 
at large scales and the corresponding lack of subspace tracking and connect 
this observation to the "crossover phenomenon" reported in [24] . Finally we 
demonstrate the accurate and efficient recovery of local curvature. 

5.1. Subspace Tracking and Recovery. We generate a data set sampled 
from a 3-dimensional manifold embedded in M."^^ according to the local model 
(1.2). The radius of the local model is set to rmax = 1 and iV = 1.25 x 10^ 
points are uniformly sampled from the tangent plane. Note that 2^*^ ~ 10^, 
and thus this choice of N represents reasonable sampling in M^". Curvature 
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and the standard deviation a of the added Gaussian noise wih be specified 
in each experiment. 

We compare our bounds with the true tangent plane recovery error. The 
tangent plane at reference point xq is computed at each scale via PCA 
of the nearest neighbors of xq. The true tangent plane recovery error 
||P — is then computed at each scale. Note that computing the true 
error requires N SVDs. A "true bound" is computed by applying Theorem 
2.1 after measuring each perturbation norm directly from the data. While 
no SVDs are required, this true bound utilizes information that is not prac- 
tically available, and therefore represents the best possible bound that we 
can hope to achieve. We will compare the mean of the true error and mean 
of the true bound over 10 trials (with error bars indicating one standard 
deviation) to the following three curves: 

1. Main Result 1 holding with probability 0.5 (magenta), 

2. Main Result 2 holding with probability 0.5 (black), and 

3. Main Result 2 with all probability constants set to 1 (green). 





i = 1 


i = 2 


i = 3 


i = 4,...,6 


3.0000 


1.5000 


1.5000 


j = 7,...,20 


1.6351 


0.1351 


0.1351 



Table 5.1 

Principal curvatures of the manifold for Figure 2-c. 

The third curve abandons any guarantee of providing an upper bound in 
favor of capturing the trend of the true error. This curve represents the case 
where we may not care to upper bound the error, but instead wish to track 
the height of the true error curve as closely as possible. We will refer to these 
curves as "Main Result 1," "Main Result 2," and "Main Result 2-trend." 

The results are displayed in figure 2. Panel (a) shows the noisy {a = 
0.01) curvature- free (linear subspace) result. As the only perturbation is 
due to noise, we expect the error to decay as 1/VN as the scale increases. 
The curves are shown on a logarithmic scale (for the Y-axis) and decrease 
monotonically, indicating the expected decay. As expected. Main Result 2 
(black) is tighter than Main Result 1 (magenta) and both accurately track 
the behavior of the true error (blue). Main Result 2-trend (green) tightly 
tracks the same behavior, in this case sitting on top of the true bound 
(red). Panel (b) shows the results for a noise-free manifold with principal 
curvatures given in Table 5.1 such that K = 12.6025. Notice that three 
of the codimensions experience high curvature while the others are flatter, 
giving a tube-like structure to the manifold. In this case, perturbation is 
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due to curvature only and the error increases monotonically (ignoring the 
sUght numerical instability at extremely small scales), as predicted in the 
discussion of Section 4.2.1. Eventually, a scale is reached at which there is 
too much curvature and the bounds blow up to infinity. This corresponds 
exactly to where the true error plateaus at its maximum value, representing 
the fact that the computed subspace is now orthogonal to the true tangent 
plane. This large scale behavior will be further explained in Section 5.3. 

Figure 2-c shows the results for a noisy (cr = 0.01) version of the manifold 
used in panel (b). Note that the true error is large at small scales due to 
noise and large at large scales due to curvature. At these scales the bounds 
are accordingly ill conditioned and track the behavior of the true error when 
well conditioned. Panel (d) shows the results for a manifold again with K = 
12.6025, but with the principal curvatures equal in all codimensions {k'P = 
1.0189 for z = 1, . . . , 3 and j = 4, . . . , 20), and noise {a = 0.01) is added. 
We observe the same general behavior as seen in panel (c), but both the 
true error and the bounds remain well conditioned at larger scales. This is 
explained by the fact that higher curvature is encountered at smaller scales 
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— 'I'rue bouiid 

— Main Result 1 

— Main Result 2 

— Main Result 2-trcnd 



True bound 
True ei 
Main Result I 
Main Result 2 
Main Result 2 -trend 



I 




N 
(c) 



N 
(d) 



Fig 2. [color online] Norm of the perturbation: (a) flat manifold with noise, (b) curved 
(tube-like) manifold with no noise, (c) curved (tube-like) manifold with noise, (d) curved 
manifold with noise. Black dots indicate minima of the curves. Note the logarithmic scale 
on the Y-axes. See text for discussion. 

for the manifold corresponding to panel (c) but is not encountered until 
larger scales in panel (d). 

In all four plots, the bounds accurately track the behavior of the true 
error. In fact, the curves are shown to be parallel on a logarithmic scale, 
indicating that they differ only by multiplicative constants. Note also that 
the true bound (red) tightly tracks the true error (blue) , providing evidence 
that the triangle inequalities used in computing the bounds are reasonably 
tight. As no matrix decompositions are needed to compute our bounds, we 
have efficiently tracked the tangent plane recovery error. The black dots in 
figure 2 indicate the minimum of each curve. In general we see agreement of 
the location at which the minima occur, indicating the scale that will yield 
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the optimal tangent plane approximation. We note that when the location 
of the bounds' minima do not correspond with the minimum of the true 
error (such as in panel (d)), the discrepancy occurs at a range of scales for 
which the true error is quite flat. In fact, in panel (d), the difference between 
the error at the computed optimal scale and the error at the true optimal 
scale is on the order of 10^^. Thus the angle between the computed and 
true tangent planes will be less than half of a degree. For a large data set 
it is impractical to examine every scale and one would instead most likely 
use a coarse sampling of scales. The true optimal scale would almost surely 
be missed by such a coarse sampling scheme. Our analysis indicates that 
despite missing the true optimum, we may recover a scale that yields an 
approximation to within a fraction of a degree of the optimum. 

5.2. Principal Curvatures of Mixed Signs (Saddle). As discussed in Sec- 
tion 3.2.1, a key difference between Main Results 1 and 2 is the ability of 
Main Result 2 to properly handle principal curvatures of mixed signs. Main 
Result 1 requires the assumption that all principal curvatures in codimension 
i have the same sign and thus cannot properly track the tangent plane recov- 
ery error for points sampled from a saddle. This is demonstrated in figure 3- 
a, showing the results for a 2-dimensional noise-free saddle {d = 2, D = 3) 

(3) (3) 

with principal curvatures k\ =3, and ^2 = —3. While all other curves 
behave as expected, the curve corresponding to Main Result 1 is identically 
zero because K = 3 — 3 = 0. Main Result 2, through its use of Knn and 
Kmn, avoids this problem. Figure 3-b shows the results for a 2-dimensional 

(3) 

noise-free saddle {d = 2, D = 3) with principal curvatures k\ =4, and 

(3) 

^2 = —3. Despite the fact that the assumption of Main Result 1 is violated 
(the principal curvatures are of mixed signs) , the corresponding curve does 
in fact track the recovery error because i^ = 4 — 3 = lis not identically zero. 
The fact that the proper behavior is seen despite the violated assumption 
indicates that a tighter curvature analysis in Section 3.2.1 is possible. 

5.3. Spectral Crossover at Large Scales. Here we examine the inability 
to track the proper subspace at large scales, which is clearly indicated by 
the ill conditioning of the bounds and the plateau of the true error at its 
maximum value in figures 2 and 3. We demonstrate that this is an effect of 
curvature. 

In [24] it was shown that the PCA of a noisy linear subspace is prone to 
a "sudden loss of tracking" of the dominant eigenvector. This occurs when 
an eigenvalue corresponding to noise overtakes an eigenvalue corresponding 
to signal. In this setting, once the crossover has occurred, the dominant 
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N N 



(a) (b) 

Fig 3. [color online] A 2- dimensional saddle (noise free) is shown with (a) K = and (b) 
K = 1. Note that Main Result 1 is identically zero in (a) but accurately tracks the true 
error in (b). See text for discussion. 



eigenvector may point in any random direction. Consider now our geometric 
model and let the sample points be noise-free to demonstrate a similar phe- 
nomenon owing to geometry rather than noise. Recall that, in this setup, 
condition 1 requires there be sufficient separation between the spectrum of 
the linear structure and the spectrum of the curvature. Also recall that 
is the corresponding condition number. When becomes large, there is 
little separation of the spectra, and a curvature eigenvalue approaches a tan- 
gent plane eigenvalue. Once the curvature eigenvalue becomes larger than 
the tangent plane eigenvalue, the computed eigenspace contains a direction 
orthogonal to the true subspace. This is seen in figure 2-b and figure 3 
where the bounds blow up to infinity and the true error plateaus at its max- 
imum value indicating orthogonality. As the crossover is due to curvature, 
an eigenvector in a direction orthogonal to the true tangent plane is intro- 
duced into the top d computed eigenvectors. Thus the computed and true 
tangent planes are orthogonal at large scales. 

Numerical evidence of this phenomenon is given in figure 4. The eigenval- 
ues (mean over 10 trials) corresponding to the saddle from figure 3-b {d = 2, 
D = 3, k\ ' = 4, 

= —3) are plotted function of scale. At small 
scales, the two tangent plane eigenvalues (blue and red) dominate the cur- 
vature eigenvalue (green) and the subspace recovery error is well conditioned 
at small scales in figure 3-b. Notice that at roughly = 2500, the curvature 
eigenvalue crosses the two tangent plane eigenvalues. After this scale, the 
largest (blue) eigenvalue corresponds to curvature but is now included in 
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5000 



N 



Fig 4. [color online] The eigenvalues computed from the saddle in figure 3b are plotted as a 
function of scale. Note the crossover between the curvature and tangent plane eigenvalues 
at roughly N = 2500, corresponding to the lack of subspace tracking at the same scale in 
figure 3b. 

the computed tangent plane. Thus the computed tangent plane contains a 
direction orthogonal to the true tangent plane. This is seen in figure 3-b 
as the true error (blue) plateaus at its maximum value. At this large scale 
the bounds become ill conditioned (or negative) as condition 1 ((^ > 0) is 
violated. As there is no noise in this example, the crossover phenomenon is 
similar to that reported in [24], but is the result of curvature at large scales 
rather than noise. 

5.4. Recovering Neighborhood Curvature. The expectation of the curva- 
ture term in each codimension has the following form: 



Thus given data in the coordinate system aligned with the principal direc- 
tions described in Section 1.2, we may track the trajectory of the center as 
a function of scale and compare it to (5.1) to recover an estimate of Ki for 
i = (d + 1), . . . , Z). Table 5.2 shows that this procedure results in a very 
accurate recovery of the local curvature at low noise levels, and the recovery 
becomes worse as the noise increases and the limit of the geometric uncer- 
tainty principle (4.9) is approached. We note that accuracy improves as N 
becomes large, as expected by the CLT. Note that using (5.1), we recover the 
individual KiS from which the overall K is computed (by equation (3.5)) 
and is reported in the tables. While this method does recover each Ki, the 
individual principal curvatures not recovered. As it does not 
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K = 2 


K =10 


K = 20 


a = 


1.9989 ±0.0038 


10.0079 ±0.0145 


20.0047 ± 0.0476 


a = 0.005 


2.0020 ± 0.0032 


10.0096 ±0.0251 


19.9903 ± 0.0406 


a = 0.01 


2.0005 ± 0.0048 


9.9952 ±0.0202 


20.0051 ± 0.0478 


a = 0.025 


1.9928 ± 0.0044 


9.9516 ±0.0279 


19.9104 ± 0.0428 


a = 0.1 


1.8877 ±0.0705 


8.8781 ± 0.0808 


17.8829 ± 0.0949 



Table 5.2 

Estimation of curvature at different noise levels (d = 5, D = 20, N = lO"*;. The mean 
and standard deviation are reported from 10 trials. The estimation is accurate for low 
levels of noise and loses accuracy as the noise level increases. Note that the individual 
Ki's are recovered from which the overall K is computed according to equation (3.5). 

require matrix decompositions and only uses vector addition, this method is 
computationally efficient. If one is willing to perform N SVDs, this method 
combined with the analysis of [4] might yield the individual principal cur- 
vatures. 

While it is unrealistic for data to be observed in the desired coordinate 
system aligned with the principal directions, tracking the trajectory of the 
center in each dimension yields the rotation necessary to transform to this 
coordinate system. Further, tracking the trajectory may yield a clean es- 
timate of the reference point of the local model in the presence of noise. 
While the noise renders this trajectory unstable at small scales, it is very 
stable at scales above the noise level. Using the stability of the trajectory 
at large scales may allow us to extrapolate back and accurately recover the 
trajectory at small scales, yielding an estimate of the "denoised" reference 
point. 

6. Algorithmic Considerations and Future Work. There are sev- 
eral algorithmic issues to be considered in implementing this analysis for 
optimal tangent plane recovery. Such considerations are topics of our cur- 
rent research and we give a brief discussion here. 

6.1. Parameter Recovery. In any practical use of this analysis (and in 
keeping with its spirit), each of the parameters d, rmax, K, and a must be 
recovered from the data itself rather than estimated by an a priori fixed 
value. 

d. There exist algorithms for estimating the (local) intrinsic dimension- 
ality of a data set. The recent work in [4] presents a multiscale approach to 
estimate d in a pointwise fashion. Performing an SVD at each scale, d is 
determined by examining growth rate of the multiscale singular values. It 
would be interesting to investigate if this approach remains robust if only 
a coarse exploration of the scales is performed, as it may be possible to 



OPTIMAL TANGENT PLANE RECOVERY 



35 



reduce the computational cost through an SVD-update scheme. Another 
scale-based approach is presented in [32] and the problem was studied from 
a dynamical systems perspective in [9] . 

Tmax- The maximum radius for which the local model (1.2) is valid may 
be estimated by a multiscale partitioning of the data set. Partitioning from 
fine to coarse, regions that produce similar tangent plane estimates at the 
same scale may be merged. Such an approach is similar to the aggregation 
process in [18], hierarchical clustering [5], data partitioning to find affine 
subspaces ("flats") [31], subspace arrangement for homogeneous data subsets 
[20], and spectral clustering [1]. 

K. We have demonstrated our ability to recover K given data in the 
coordinate system described in Section 1.2. Additionally we have discussed 
how tracking the trajectory of the centering may yield both the rotation into 
the desired coordinate system as well as a clean estimate of the otherwise 
noisy reference point. The accuracy and stability of such a scheme remains 
to be tested and it will be interesting to investigate if this may be a path to 
a simpler recovery of the tangent plane. 

It is worth mentioning that while the definitions of K and Ki used in 
this work arise naturally from the analysis, they are not the only possible 
definitions. One could define 



This definition does not match the calculation for E[Cj] but has the advan- 
tage of handling negative principal curvatures in a more natural manner. 
Indeed, we have seen that Main Result 1 does not hold for the case of a 
saddle with = 0, but in fact does hold for a saddle with iTj 7^ 0, thereby 
indicating that this bound may hold in a more general context. 

a. There exist many statistical methods for estimating the noise level 
present in a data set (see, for example, [3, 7]). In [4], the smallest multi- 
scale singular values are used as an estimate for the noise level and a scale- 
dependent estimate of noise variance is suggested in [8] for curve-denoising. 

The parameters of our analysis may not remain constant over the entire 
data set. It is possible, if not likely, to experience very different sampling 
densities, noise-levels, curvature and dimensionality as one explores differ- 
ent regions of a data set. This fact increases the need for careful parameter 
selection and emphasizes the importance of a local analysis. Initial experi- 
ments indicate that Main Results 1 and 2 are sensitive to changes in these 
parameters. For example, over/under estimating K or a will result in ill 
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conditioning at smaller /larger scales than seen in the true error. In depth 
experimentation will be necessary to precisely quantify the robustness of the 
results to parameter perturbation. 

6.2. Sampling. For a tractable analysis, assumptions about sampling 
must be made. In this work we have assumed uniform sampling in the tan- 
gent plane. This is merely one choice and we have conducted initial experi- 
ments uniformly sampling the manifold rather than the tangent plane. Re- 
sults suggest that for a given radius, sampling the manifold yields a smaller 
curvature contribution than does sampling the tangent plane. While more 
rigorous analysis and experimentation is needed, it is clear that considera- 
tion must be given to the sampling assumptions of any practical algorithm. 

6.3. From Tangent Plane Recovery to Data Parameterization. The tan- 
gent plane recovered by our approach may not provide the best approxima- 
tion over the entire neighborhood from which it was derived. Depending on 
a user-defined error tolerance, a smaller or larger sized neighborhood may 
be parameterized by the local chart. If high accuracy is required, one might 
only parameterize a neighborhood of size N < N* to ensure the accuracy re- 
quirement is met. Similarly, if an application requires only modest accuracy, 
one may be able to parameterize a larger neighborhood than that given by 
N*. 

Finally, we may wish to use tangent planes recovered from different neigh- 
borhoods to construct a covering of a data set. There exist methods for align- 
ing local charts into a global coordinate system (for example [2, 27, 35], to 
name a few). Care should be taken to define neighborhoods such that a data 
set may be optimally covered. 

Acknowledgement: We thank Brendt Wohlberg for his initial contribu- 
tions to this work that bordered on coauthorship. 

Supplement to "Optimal Tangent Plane Recovery From Noisy 
Manifold Samples". Technical results and long calculations from the 
main text are presented as supplementary material. The supplement is or- 
ganized as follows. Section A details the construction of the set over 
which projections of a Gaussian noise vector are bounded. Section B presents 
bounds on suprema terms and calculations for expectations used in Main 
Result 1. The norm bounds and associated probabilities for Main Result 1 
are listed in Section C. Section D details moment calculations for coordinates 
uniformly sampled from the tangent plane. Finally, confidence intervals and 
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norm bounds for Main Result 2 are computed via the Central Limit Theorem 
in Section E. 

APPENDIX A: THE SET 

Here the set 0,e over which projections of the vector e have bounded norm 
is formally defined. Begin by recalling a standard result on the concentration 
of the Gaussian measure on M^, denoted by 7Ar. Consider a random vector 
drawn from the AA(0,(T^lAr) distribution. By the concentration of Gaussian 
measure [13], the set 

S = {xeR^: ^/N{l-e) < \\x\\/a < \/iV(l + e)} 

has measure ^ 

77v(5) > l-2e-^. 

This result states that the Gaussian measure of the set of points in that 
concentrate about the sphere of radius a\/N is extremely large. 
The sets 

rii = {xGM'^: ||x|| < cj\/(i(l + ei)} 

VL2 = {x eM.'^^'^ : \\x\\ < aVD - d{l + €2)} 

have Gaussian measure 

7d(ni) > 1 - 

-(D-d)el 

jD-d(.^2) > 1 - e 2 

We may define sets 

= {x e : \\U'[x\\ < aVd{l + ei)} 

= {x£R^ : \\U^x\\ < aVD - d{l + £3)} 

such that and O^/j are the preimages of ili and ^1,2, respectively, in M^. 
The Gaussian measures of the sets 0,i and Q2 are the pushforwards of the 
Gaussian measure in by the respective projections Ui and C/J: 

^d{^U2) = lD-d{^2) < e 2 , 

where denotes the complement of the set 
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Finally, define 

and a standard union bound argument yields 



del -(D-d)e^ 



7£>(^e) = 7d{^Ui U CIU2) = 7d{^Ui) + mi^Ui) < e 2 + e 2 



Set ex^Jdp. = £2^/{D-d)/2 = ^e- Then both 

||f/fe|| < cr (yd + ^eV2^ 
\\Uje\\ < a (VD-d + ^eV2^ 

hold on rie, and 
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APPENDIX B: SUPREMA AND EXPECTATIONS FOR MAIN 

RESULT 1 

B.l. Suprema and R^. Listed here are the suprema terms needed 
for the calculations leading to Main Result 1. The calculation is outlined in 
Section 3.2.1. 

First we have terms: 



Rf, ^ T'tnax 



N \~d 



1 o / N \^ 



*"£ — 9 max 1 A T 

Z \ I\ max 



Rl < a (^Vd + (,eV2j onn 
Rl<a(VD^ + ^eV2) onn 



Then using (3.19) we may bound the remaining R^: 



^ ^cc — ' max \ A T 

^ \-''^max 

1 / iV \ 3 



Rcl ^ ^^'''max 



2 max \ A r 
\ ^^max 



e-i 



Rfe < (VD-d + (eV2) ^ on n 



6) 



6) 



Rfe <(T^{Vd + ^eV2j [VD^ + (eV2j on Q 

Rll < (TTmax (]^^) ^ + ^e^) °^ ^ 

Rfe < 

max 

on 

2 

Rll < iKarl,, ' [Vd + CeV2) on n 

2 

Rl! < iKarl,, ' [s/D^d + ^eV^) on 0^. 
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B.2. Expectations. Here we detail the calculation of the expectations 
from Section 3.2.2. Each term is of the form 



E[U^{a-E[a]){b-E[b]yU, 



As only pure curvature (cc^) and pure noise (ee^) terms have nonzero ex- 
pectation, the calculations of all other terms are omitted. 



Pure Curvature Term: The expectation of the pure curvature term is 
computed as follows. Consider first {cu2C^2)id ~ ^i^i ^o^' ~ {d+^)^ . . . ,D. 
Then 



E[ciCj] 



(B.l) 

Next consider (c 



1 



W/?2n/ (i)«2 



4 



N 



4(d + 2)(d + 4) \Nma 

U2)i = Ci for i = {d + I), . . . ,D. Then 



(B.2) 



and we have 



E[ci]E[cj] 



4(d + 2) 



N \d 



N„ 



(B.3) 



E[c,,Z]-E[c„,]E[c^J 



N \3 



2((i + 2)2((i + 4) \N, 



D D 



E \id+y)icL-Kz. 

i=d+l j=d+l 



Pure Noise Terms: Using that the entries of e are i.i.d. random variables 
from the M{0, cr^) distribution, we have 
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E[e„^e^l-E[e„jE[e:^„ 



d d 

E I 



E E E[e.e,f 



F 
d 



T 



E nei? + E ne^e,Y 



' D D 

E E IE[e.e,] 

i=d+l j=d+l 



' D d 

E EE[eie,f 
,j=d+iji=i 



\ ( 



E IE[ef]2+ E IE[e.e,f 



if = (1,1), 



=d+l 



if = (2,2), 
if (p,g) = (2,l). 



APPENDIX C: NORM BOUNDS FOR MAIN RESULT 1 

The right-hand side of the confidence interval (3.18) is used to bound 
the size of the perturbation norms. When considering noise terms, recall 
that we must condition on e G ilg- Recalling the rescaled notation r = 
rmax (N/Nmax)^^'^ is helpful for interpretation. 



Curvature 



< 



2(d + 2)2(d + 4) \N, 



N \d 



D D 
i=d+l j=d+l 



(C.l) + 



N 4 



N \3 



1 



2 + T]ccV2) + — [2 + r/e\/2 ) 



A 2' 



with probability at least 1 — e "^^^ — e over the random selection of the 
sample points. 
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Noise 

\ur{j,EE^)u, 

(C.2) a^Vd + -^a^ (Vd + ^eV2y ^2 + r;eeV2) + ^ (2 + % ^2)' 



< 



2 2 t2 

with probability at least 1 — e~^«e — g^^e — 2e~*e over the random realization 
of the noise. 

(C.3) X (2 + r?ee\/2) + ;/= (2 + ^e\^) 



2 2 t2 

with probability at least 1 — e~''«='= — e"''^ — 2e~^e over the random realization 
of the noise. 

Q^EE^^^ Ui ^< -^a^ [Vd + eeV2) (yo^ + V2) 

(C.4) X (2 + ?7eeV2) + (2 + 77e V2) 



2 2 t2 

with probability at least 1 — e^^^e _ e~Ve _ 2e~*e over the random realization 
of the noise. 

Linear-Curvature Interaction 



< 



(C.5) 



1 K 



(2 + 7]ieV2^ +^(2 + % V2) (2 + r?c V2) 



AT 

with probability at least 1 — e~'^tc — e'''^t — e~^^ over the random selection 
of the sample points. 



Linear-Noise Interaction 



Ut ( -EL^ ] U, 



(C.6) 



1 / N \d 

< — ^arr 



max 1 , J 

N V iV„ 



Vd + ^eV2j 



(2 + %e V2) + (2 + % V2) (2 + r/e V2) 
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with probability at least 1 — e '^ee — e —e — 2e over the joint random 
selection of the sample points and random realization of the noise. 



N 



< 



1 



(C.7) 



N 

X 



avr. 



N 



(2 + ?7fe V2) + (2 + % V2) (2 + r?e V2) 



with probability at least 1 — e '"'ie — e ''i — e —2e over the joint random 
selection of the sample points and random realization of the noise. 



Curvature-Noise Interaction 



CE'^ 1 Ui 



< 



1 K 



N 2 



-(7T 



AT \-d 



(Vd + V2) 



(C.8) 



(2 + r/ce V2) + ^ (2 + r/c V2) (2 + rieV2^ 



2 2 2 (-2 

with probability at least 1 — e~'''='= — e~''<= — e~'^^ — 2e~*<= over the joint random 
selection of the sample points and random realization of the noise. Note that 

\\ur {j,Ec^) U2\\f = \\ul i^cE^) u^y. 



CE' U2 



< 



1 K 2 

err 

N 2 """"^ 



AT \i 



(C.9) 



(2 + r/ce V2) + ^ (2 + r/c V2) (2 + rieVfj 



2 2 2 t2 

with probability at least 1 — e~^=e — e"^^ — e""''^ — 2e~*e over the joint random 
selection of the sample points and random realization of the noise. 



APPENDIX D: MOMENT CALCULATIONS 

The following moment calculations will be used in the confidence interval 
calculations. Let Lj be the random variable that returns the ith coordinate 

of a point from B'^{r), randomly chosen according to a uniform distribution. 
Let X = [xi X2 ... Xd] and compute the following expectations with respect 
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to n, the uniform measure on 



JB<i{r) 



\\ / ^1 I dxi ... dxi-i dxi+i ... dxd dxi 



YO\{B''-\l)) 



f xnr' - xj)'-^ dxi 

J—r 



vol(5<^(r)) 

Similarly, 



Then we compute the following moments: 

' 3+2 



= Var[L,] 



E[LiLj] = Var[LiLj] = (^^2)(d+4) 

= Var[L?] = 

E[L,L2] = Var[LiL2i - S'"' 



E[L3] = Var[Lf] = 

^[-^i-^jJ - (,/+2)(rf+4) var[ivjiv^-j - (,/_2)^(>/-i)^(>/-(i)(d+8) 

(d+2)(d+4) Var[I.-J - (d_,_2)2(d+4)2(d+6)(d+8)- 



APPENDIX E: CENTRAL LIMIT THEOREM CALCULATIONS FOR 

MAIN RESULT 2 

Here we detail the Central Limit Theorem (CLT)-based calculations that 

are used for Main Result 2. In the following analysis we will write ^^Li ' 
Y meaning that the sum converges in distribution to the random variable 
Y, and we will then indicate the distribution from which Y is drawn. 
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E.l. Matrix Entries. 



E.1.1. Centering. Wc first compute the entries of the matrices represent- 
ing the centering terms ^[L], IE[C], and E[£J]. 

• Linear 

1 ^ 

(E.l) K[Li] = Are [fi-T, fi + T] 



k=l 



(E.2) 
(E.3) 



f^ = 



r = VL- 



1 



N 



d + 2 

with probabiUty greater than 1— e~''i, where Y ^ M ;^Var[Lj]) . 

• Curvature 



(E.4) 



1 ^ 



N 

1 
2 



k=\ 



N N ■ 



k=l 



k=l 



1 

2 L' 



nfy^ + ■■■ + nfYa 



G [Ai-r, li + V] 



(E.5) 
(E.6) 



r 



* ' maa; 



2 (d + 2) VA?", 



AT \3 



-'^J ' max 



2 (d + 2) VA?", 



0V V c/ + 4 



with probability greater than 1 
~AA(E[Lf],i^Var[L2]) for j = 1, . . . , d. 



de ^ where Yj 



Noise 



1 ^ 

(E.7) = j^T^^i^k = Y e 



k=l 



-VE 



aV2 aV2 



, VE- 
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with probability greater than 1 — e"^^^ where Y J\f [O, j^a^) ■ 

Combining these results, we will use the following centering terms: 

. i[a]i[c,] G [/x-r, /x + r] 

(E.8) 



4(d + 2)2 '""^ V^' 



(E.9) 



N V / ^VC /d+l ^ 4r/^, f d+l 



max 

2 



4{d + 2)2 '""^ V N, 
with probability > 1 — de'^c , 



2 2 2of 

'IE N ' 'IE N 



with probability > 



1 - 26-"^ 7^ j) 



l-e-'^E {i = j), 

E[Li]E[Cj] e [/x-r, /x + r] 

(E.IO) n = 



(E.ll) r = r7L 



iV \^ 



1 + 



2r?c /d + l 



AT V + 4 



with probability > 1 — e '^'i — de ^'c , 

E[Li]E[£;j] G [/x-r, /x + r] 

(E.12) /X = 



(E.13) 



1 2(7 

r = r]Lr)E — ^^=rn 



2 2 

with probability > 1 — e''^L — e~''^E , 

E[a]E[£;,] G [/x-r, /x + r] 

(E.14) n = 



N \3 



TV. 



(E.15) T = r]E 



1 di^i 



ViVV2(ci + 2) 



iV \3 



1 + 



2r/c d+l 



ViV V d + 4 
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2 2 

with probability > I — de~'^c — e~^E , 

E.1.2. Curvature. The entries of the pure curvature term CC'^ are com- 
puted as follows. Note that the curvature term is the only one for which the 
entries grow with N. 



(E.16) 
1 ^ 



k=l 

1^1. X 1 



^ 2^ 2 V 1 ^''^ <^ '^'^ ) 2 ' 

k=l 



N ^ d , N 



n=l k=l m,n=l k=l 

4 Z-,- " " " 4 Z-,- " 

n=l m,n=l 

G [/i - r, + r] 



(E.17) 



1 r^„^ / \ 3 . . . . 



4 • • * 



1 r'* f N \d 

■p ' max I ' ' 

~ ^/iV2^/2(d + 2)((i + 4) V^mc 



(E.18) 



X ( nr^r^ . 24(d+l)(4d+17) ■ 8(d2 + M + 3) \ 

X J^7?cci + 6)(d + 8) + "^ocK^n^ + G)(d + 8)) 

with probability greater than 1 — de ^(^^i — ^^^^^-^e ^c;c2^ where Z„ ~ 
Ar(E[L^], ^Var[L4]) and ~ Ar(E[L2L|], ^Var [L^L^]) , for m, n = 1, . . . , d. 
Subtracting i[Ci]i[Cj] from (E.16), 

(E.19) (ji^^^) ^ [^-r, /i + r] 
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(E.20) 

1 r^^^ f N \^ r j • j . 



(E.21) 



1 / TV \ 3 

^ ' max ' ' 



N4:{d + 2) \N, 



''max 



X Lcc / 48(d+l)(4d+17) (^^ + 5d + 3) 

i '/^^I'^^^y (d + 4)2(d + 6)((i + 8) ^'/^^2^'^"Y (d + 4)2(ci + 6)(d + 8) 



+ 



(d + 2) Vq! + 4 ^ (d + 2)(d + 4) 



with probability greater than \ — de ^cc^ _ '^I^Lilg ''cc2 — ''c . Note that 
only the lower right (D — d) x (D — d) entries are nonzero. 

E.1.3. Noise. A diagonal entry of the pure noise matrix EE"^ is the 
square of the norm of a vector in M.^ of Gaussian random variables. We 
could use the concentration of Gaussian measure to bound this norm (see 
Section A), but obtain a slightly tighter result using the CLT. Note that 
this norm neither grows nor decays with N, and its leading order term cr^ 
represents a noise- floor. An off-diagonal entry is the inner-product between 
two such vectors, and we therefore expect it to be small. Using a Bernstein- 
type inequality [21] to bound such entries yields the same leading behavior 
but with higher order terms, whereas using the CLT does not. Properties of 
the Wishart distribution could also be used [23] . Note that the off-diagonal 
terms tend to zero as grows. 



• Diagonal entry (i = j) 



(E.22) 

k=l k=l 

2 

with probability greater than 1 — e ''^^i , where 1^ ~ X^iX) that 
E[yfc] = 1, Var[Yfc.] = 2, and Z ~ N {¥.[Yk],j^\ai[Yk\). Subtracting 
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E[Ei]E[Ej] yields 
(E.23) 




with probability greater than 1 — e — e ^'s . 
• Off-diagonal entry {i ^ j) 



1 ^ 



fc=i 



—flEE2 ■ 



2^ 



with probability greater than 1 — e ^^^2 , where 
y - (E[£;i£;j], j^\ai[EiEj]). Subtracting yields 



(E.25) 



N 



-EE' 



a' 



with probability greater than 1 — e ''^^2 — 2e ''b. 
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E.1.4. Linear- Curvature Interaction. The entries of the hnear-curvature 
term are computed as follows. 

(E.26) 



1 ^ 



k=l 



k=l 



_ 1 

~ 2 

d 1 

^ 2 



TV 



AT 



(i) 



N 



N 



k=l 



k=l 



N 



Ak 



k=l 



n=l 



e [ii-T, ii + T] 

(E.27) ijl = 

(E.28) r = 



N]I 2(d + 2)(d + 4)(d + 6) 
d 

n=l 



AT \d 



with probability greater than 1 — (c? — l)e ^^^^ — e ^^^2^ where 
Z ~ AA(E[Lf],^Var[L3]), and Yn ~ AA(e[L,L2], ^Var[LiL2]) , for n = 

1, . . . , i — 1, i + 1, . . . d, and Subtracting E[Lj]E[Cj], 

1 



(E.29) 
(E.30) 11 = 

(E.31) r 



AT 



1 r„ 



iV V2(d + 2) 

V(f^ + 4)(d + 6) 



I V(t^ + 4)(d + 6) (c? + 2) 



+ K. 



1 2r/L?7c 



N{d + 2) V d + 4 
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2 



with probability greater than 1 — {d — l)e ^^^'^ — e '^^^2 — e "^l — de . 

E.1.5. Linear-Noise Interaction. An entry of the hnear-noise matrix may 
be shown to be a Lipschitz function of Gaussian variables on a set with large 
measure. One may show that on this set, such a function concentrates tightly 
about its expectation (see [21]). Using the CLT to compute the Lipschitz 
constant yields the same leading order behavior as directly applying the CLT 
to the entries, but results in higher order terms as well. Thus we proceed 
with the usual CLT calculation. 



(E.32) 

1 ^ d 



k=l 



a 2 f N \d a 2 / N \ ^ 



r]LE^=\ , , rmax ^ , 1lLE—?=\ , , r, 



1 



Vn\ d + 2'"'''^yN^aJ ' '^"ViVVrf + 2-"'"-^VA^m. 

with probability greater than l—e~'^LE^ where Y ^ J\f {K[LiEj\, -^Var[Lji?j]) . 
Subtracting ¥.[Li]%E^j\, 

(E.33) {^^LE^y & [^_r, /. + r] 



(E.34) n = Q 

a I 2 ^ N \d 

(E.35) r 



V2 

VLB + VLVE 



VivV d + 2 V^m« 

2 2 2 

with probability greater than 1 — e~^LE — e~'^L — e~^E , 

E.1.6. Curvature-Noise Interaction. The entries of the curvature-noise 
matrix may be shown to be Lipschitz functions over a large set and the 
same comment holds as in the linear-noise case. Directly applying the CLT 
to the entries of this matrix, we have 

1 ^ 

fc=i 



(E.37) ^ = 

(E.38) r = r]cB I- , ^ ^'""^ ( ^i— 1 \/3K^„ - K. 

ViV V2((i + 2)(d + 4) V 



1 crr^ / N \ d 



mn 



52 



D. N. KASLOVSKY AND F. G. MEYER 



with probability with probabiUty greater than 1 — e "^ce^ where Y ~ 
Af [ElCEj], jjYsiT[CiEj]) . Subtracting i[Ci]i[^j], 



(E.39) 



CE^ \ e [fi-r, fi + T] 



(E.40) /i = 
(E.41) r 



N 



N v^2(d + 2) \Nynax 
VCE 



iV V d + 4 



with probabihty greater than 1 — e ^ce - de "^c — e ^e. 

E.2. Norm Bounds. Recall the following constants, previously defined 
in Section 4.3 and restated here for clarity: 



CCi 



id + 2){d + 4) 
+Aricc2Kti 



1 



' 48(d+l)(4d + 17) 
(d + 4)2((i + 6)(d + 8) 



(d2 + 5d + 3) 



+ 



'i7]cK^Kj d + 1 



((i + 4)2(d + 6)(d + 8) {d + 2) y d + i 



+ 



1 Ar]lKiKj{d+l] 
N (d + 2)(d + 4) 



VLC- 



i\/3 



+ 



VL 



V(d + 4)((i + 6) {d + 2 



1 2r]LVc d + 1 
N {d + 2)\l d + A 



{j)V^{r]LC2\^ -r]LCi] 
V(d + 4)(d + 6) 



V + + ^7^^^ ^ + V dT4 



V2 (riEEiV2 + rjEEiVx - 1^ + "^^1 (l + Vx -1 



OPTIMAL TANGENT PLANE RECOVERY 



53 



We now use the confidence intervals computed above to bound each pertur- 
bation norm. 

• Curvature 



(E.42) 



( ^CC^ 1 U2 



1 r"" 

^ _ ' max 



p- 4(d + 2) \N, 



N 



D D 



i=d+lj=d+l 



with probabihty greater than 



1-{D- df 
Noise 



(E.43) 



1 



( j^EE^ ) 



< 



1 + ^ {riEE,V2 + rjEE^Vd-l^ + ^rfl (l + Vd-l 



with probability greater than 



(E.44) 



1 



with probability greater than 



1-{D- d)e-''-i - (£^^)(£^l^e-^-. - - d)e- 



(E.45) 



[/J ( j^EE^ ] Ui 



< a'^Jd(D-d)^ 
F~ VN 



^ 2 V2 

VEE2 + ??E 



2 



with probability greater than 1 — d{D — d)e ^^^2 — De 
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Linear-Curvature Interaction 



(E.46) 



( -CL' 1 C/i 



< 



N ^2{d + 2) \Nmax 



N 



d D 



i=l j=d+\ 



with probability greater than 
l-d{D- d) \{d - l)e"''^cri + Q-^lc^ 

Linear-Noise Interaction 



de - de ^c. 



(E.47) 



Ut ( ^EL^ 1 U, 



< 



a 



N \ d + 2 \ N„ 



N 



IJLE + VLVE 



V2 



with probabiHty greater than 1 — d^e '^i-^ - de — de 



(E.48) 



< 



Vd{D - d) 



— ^—r 



with probabihty greater than 1 — d^e ''i^b — de ^'^ — de 
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Curvature-Noise Interaction 



(E.49) 



< 



a 



d 



nV 2{d + 2) 



N 



N„ 



D 

i=d+l 



with probability at least 1 - d{D - d)e ^ce - de '^c - de 



(E.50) 



( j^CE^ ) U2 



< 



a 



D-d 



N 



with probability greater than 

1-{D- dfe-'^cE - de-'^c - [D - d)e-'^E . 



D N 
=d+l / 



The norm bounds are combined to yield Main Result 2 (equation (4.13)) 
and a union bound is used to establish its associated probability. 
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